What lexical sets tell us about conceptual categories
Abstract
It is common practice in computational linguistics to use selectional constraints and semantic type hierarchies as primary knowledge resources for word sense disambiguation (cf. Jurafsky and Martin 2000). The most widely adopted methodology is to start from a given ontology of types (e.g. WordNet, cf. Miller and Fellbaum 2007) and to use its implied conceptual categories to specify the combinatorial constraints on lexical items. Semantic typing information about selectional preferences is then used to guide the induction of senses for both nouns and verbs in texts. Practical results have shown, however, that this approach has a number of problems. For instance, as corpus-driven pattern analysis shows (cf. Hanks et al. 2007), the paradigmatic sets of words that populate specific argument slots within the same verb sense do not map neatly onto conceptual categories, because they often include words belonging to different types. Moreover, the internal composition of these sets changes from verb to verb, so that no stable generalization seems possible as to which lexemes belong to which semantic type (cf. Hanks and Jezek 2008). In this paper, we claim that these are not accidental facts related to the contingencies of a given ontology, but rather the result of an attempt to map distributional language behaviour onto semantic type systems that are not sufficiently grounded in real corpus data. We report on the efforts made within the CPA project (cf. Hanks 2009) to build an ontology that satisfies such requirements, and we explore its advantages in terms of empirical validity over more speculative ontologies.
Similar resources
Modelling speech production – evidence from Swedish blends
This paper is concerned with a type of speech error called blends and what they may tell us about speech processing. A blend is a contamination of elements from two different lexical items. Sometime during the process of retrieving a lexical item for output, two functionally synonymous items that compete for the same slot are fused and realised as one separate word or nonce word (Levelt 1989: ...
What do category-specific semantic deficits tell us about the representation of lexical concepts?
A reassessment of category-specific semantic deficits in light of their contribution to a theory of the representation of lexical concepts is proposed. Two theories are examined: one, held by the majority of researchers in the field, claims that concepts are represented by sets of features; another, in contrast, claims that concepts are atomic representations. An analysis of category-specific s...
What Substitutes Tell Us - Analysis of an "All-Words" Lexical Substitution Corpus
We present the first large-scale English “all-words lexical substitution” corpus. The size of the corpus provides a rich resource for investigations into word meaning. We investigate the nature of lexical substitute sets, comparing them to WordNet synsets. We find them to be consistent with, but more fine-grained than, synsets. We also identify significant differences to results for paraphrase r...
What Can Tone Studies Tell Us about Intonation?
The present paper demonstrates that much can be learned about intonation through the study of the contribution of lexical tones to the f0 contour of speech utterances. A fundamental principle and several basic mechanisms of tone production and perception are proposed based on studies of both tone and intonation. Their implications for intonation in general are discussed.
Social Group Stories in the Media and Child Development.
How do children and youth come to understand what it means to be a member of a particular race, gender, and other social groups? How do they come to hold beliefs about the groups that they do and do not belong to? Both news stories and fictional narratives that we are tuned into as a culture tell stories about what it means to be a member of a particular social group. In this review article, we...